AITopics | global convergence

Preconditioning Matters: Fast Global Convergence of Non-convex Matrix Factorization via Scaled Gradient Descent

Neural Information Processing SystemsApr-30-2026, 06:24:27 GMT

Low-rank matrix factorization (LRMF) is a canonical problem in non-convex optimization, the objective function to be minimized is non-convex and even non-smooth, which makes the global convergence guarantee of gradient-based algorithm quite challenging.

artificial intelligence, initialization, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)

Add feedback

Stochastic Mirror Descent in Variationally Coherent Optimization Problems

Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen Boyd, Peter W. Glynn

Neural Information Processing SystemsApr-23-2026, 23:15:30 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, convergence, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE

Chaudhry, Faris

arXiv.org Machine LearningMar-16-2026

The InfoNCE loss in contrastive learning depends critically on a temperature parameter, yet its dynamics under fixed versus annealed schedules remain poorly understood. We provide a theoretical analysis by modeling embedding evolution under Langevin dynamics on a compact Riemannian manifold. Under mild smoothness and energy-barrier assumptions, we show that classical simulated annealing guarantees extend to this setting: slow logarithmic inverse-temperature schedules ensure convergence in probability to a set of globally optimal representations, while faster schedules risk becoming trapped in suboptimal minima. Our results establish a link between contrastive learning and simulated annealing, providing a principled basis for understanding and tuning temperature schedules.

artificial intelligence, convergence, machine learning, (18 more...)

arXiv.org Machine Learning

2603.12552

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Add feedback

f02f1185b97518ab5bd7ebde466992d3-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 21:40:35 GMT

artificial intelligence, initialization, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Macao (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)

Add feedback

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms Lam M. Nguyen

Neural Information Processing SystemsFeb-17-2026, 20:41:06 GMT

Stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings.

artificial intelligence, convergence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(7 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

On the Convergence of Encoder-only Shallow Transformers

Neural Information Processing SystemsFeb-16-2026, 07:01:55 GMT

Besides, neural tangent kernel (NTK) based analysis is also given, which facilitates a comprehensive comparison. Our theory demonstrates the separation on the importance of different scaling schemes and initialization.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: